System, device and method for automatic anomaly detection

ABSTRACT

The invention relates to a method and system for monitoring the behaviour of at least one observable object, e.g. a network element, of a network, wherein at least one parameter of the observable object is repeatedly detected. A currently detected parameter is input to a learning process and to an analyzing process, wherein the learning process forms a reference, based on at least two detected parameter values, for describing the behaviour of the observable object. The analyzing process compares the input parameter and the reference for detecting anomalous behaviour.
The parameter preferably is a vector which comprises several values describing properties or functioning of the observable object, and is formed based on events and/or reports from the object.

FIELD AND BACKGROUND OF THE INVENTION

[0001] The invention relates to a system, device and method for automatic anomaly detection, in particular to the automatic detection of anomalies in quality indicators updated in real-time.

[0002] Some networks may provide a network infrastructure for operators for offering services to the subscribers. Because network infrastructure is very complex and may be affected by the environment, problems may arise which decrease the quality of service experienced by the subscribers. If such problems are detected and solved quickly and efficiently, the quality of service may be kept at a very high level.

[0003] The expectations of customers regarding access to services over the Internet are becoming more demanding, and response times for access to critical data are getting more important. As a result, efficient real-time support over networks will be critical for the continued growth of the Internet and intranets. This support for real-time services requires “Quality of Service” management procedures in mobile networks so that the scarce spectrum can be used as efficiently as possible.

[0004] The significant growth of networks including an increased number of different elements requires sophisticated methods and tools that enable centralized network and service monitoring in large networks so as to provide effective network operation.

[0005] Mechanisms for detecting abnormal situations belong to one of two major categories, namely rule-based detection mechanisms and anomaly detection mechanisms (sometimes also called novelty detection mechanisms). Rule-based detection mechanisms attempt to recognize certain behaviour patterns which are known to be improper, such as the exceeding of given thresholds. Thus, rule-based detection mechanisms have two severe limitations: they can only detect problems which have occurred before and which have been explicitly taught to the detection system or programmed into it. Anomaly detection systems (ADS), as used in this application, reverse the detection problem: they are taught what normal behaviour is, and anything deviating significantly (by a predetermined margin) from the norm is considered anomalous. ADS mechanisms are capable of detecting potentially problematic situations without explicit training of such situations. An example of an ADS is disclosed in the article: Höglund, Albert: An Anomaly Detection System for Computer Networks, Master of Science thesis, Helsinki University of Technology 1997. Thus an ADS is defined as a mechanism which is trained with normal behaviour of the target system. Accordingly, an ADS flags every significant deviation from normal as a potential anomaly. In contrast, a rule-based detection system is trained with known modes of abnormal behaviour and it can only detect the problems that have been taught to it.

[0006] Generally it is difficult to have alarms indicating quality of service problems. It is also very challenging to define proper thresholds which generate appropriate numbers of alarms. If the alarm thresholds are too high, there are no notifications about problems. If the alarm thresholds are too low, there are too many alarms to be handled efficiently. If the alarm thresholds are updated manually, the updating is very cumbersome and must be performed whenever the network conditions change. Further, alarm thresholds are normally different in different parts of the network, which leads to additional problems.

[0007] Usually the operators are not able to freely define the Key Performance Indicators (KPIs) which are monitored. The KPIs are defined by the network manufacturer and the operator can only select whether or not to use a KPI. In systems which monitor predefined KPIs of a network element, the operator may be able to define alarm thresholds for the KPIs manually. In such cases, it is only possible to monitor the most important issues on a general level. Furthermore, the adjusting of alarm thresholds is very difficult.

[0008] With an ever-increasing alarm flow it is vital that the network operator has means to cut down the number of less important alarms and warnings. In this way the operating personnel can concentrate on service-critical alarms that need to be dealt with immediately.

[0009] Simply counting the number of error-indicating events, and issuing an alarm when the number of events exceeds some user-determined value, does not function properly in some situations. For example, off the coast of Helsinki there are some islands with a single base station on them. Boats with several hundred passengers pass the islands every now and then, and naturally the base station on those islands may be very highly loaded by the mobile subscribers on the boat. When broken calls are counted for causing alarms, such alarms will be false, because the calls are broken by a natural phenomenon, i.e. the passing ship moving out of the coverage area of the mobile network, and not by any network malfunction. However, at some other base station a similar course of events might indicate a severe network problem.

[0010] The article: Höglund, Albert: An Anomaly Detection System for Computer Networks, Master of Science thesis, Helsinki University of Technology 1997, discloses an ADS for a Unix-based computer system. The disclosure contents of this article are in toto incorporated herein by reference. The disclosed system consists of a data-gathering component, a user-behaviour visualization component, an automatic anomaly detection component and a user interface. The system reduces the amount of data necessary for anomaly detection by selecting a set of features which characterize user behaviour in the system. The automatic anomaly detection component approximates users' daily profiles with self-organizing maps (SOM), originally created by Teuvo Kohonen. A crucial parameter of a SOM is a Best Mapping Unit (BMU) distance. The BMUs of the SOMs are used to detect deviations from the daily profiles. A measure of such deviations is expressed as an anomaly P-value. According to reference 1, the ADS has been tested and found capable of detecting a wide range of anomalous behaviour.

[0011] U.S. Pat. No. 5,365,514 discloses an event-driven interface for a system for monitoring and controlling a data communications network. The device listens to the serial data flow in a LAN (Local Area Network), and provides a control vector. The device is not structured to receive and analyse packets of a packet flow.

SUMMARY OF THE INVENTION

[0012] The invention provides a system and method for automatic anomaly detection, in particular for the automatic detection of anomalies in quality indicators updated in real-time.

[0013] The present invention provides a system, method and/or device as defined in any of the independent claims or any of the dependent claims.

[0014] According to one aspect of the invention, a method and system are provided for monitoring the behaviour of at least one observable object of a network, wherein

[0015] at least one parameter of the observable object is repeatedly detected,

[0016] at least one parameter is checked with regard to fulfilling predetermined criteria,

[0017] a vector is formed based on the monitored parameter depending on the result of the checking step,

[0018] and the formed vector is evaluated for monitoring the behaviour of the observable object.

[0019] The observable object can be any monitored entity in the network, e.g. a network element, a subscriber, subscriber group, geographical area, circuit group, service, or the like, that can be identified and referred to in the predetermined criteria that are used in forming the vector. The vector comprises several values which describe properties or functioning of the observable object. The predetermined criteria and the observation period, during which one vector is formed, are user definable.

[0020] The formed vector is preferably input to a learning process and to an analyzing process,

[0021] the learning process forms a reference, based on the input vector and a previous value of the reference or at least one previous input vector, for describing the behaviour of the observable object, and

[0022] the analyzing process compares the input vector and the reference for detecting anomalous behaviour.

[0023] The vector is preferably formed based on detected values in RTT (Real Time Traffic) reports. An RTT report contains fields defining parameters of phenomena or events in the network. For calls, for example, it can include the reason code for the call break, the length of the call, and/or the number of handovers during the call, to name a few.

[0024] The reference formed by the learning process may be a profile vector generated from at least two vectors. In another case, e.g. in a SOM (Self-Organizing Map) case, the profile is not a vector but is made up e.g. of the trained SOM itself and the Best Mapping Unit Distance Distribution (BMUDD).

[0025] Key Performance Indicators (KPIs) which are computed and monitored measure the quality of service seen by the subscribers. The number of parameters that fulfil the predetermined criteria during an observation period can be used for forming these KPI values, which form part of the vector, each parameter having its own criteria. E.g. when an RTT value fulfils the predetermined criteria, the value of one KPI is increased by one. The fulfilling of the predetermined criteria can be checked by comparing fields of an RTT report, or values derived from them by given functions, to predetermined field thresholds.

[0026] One condition can e.g. be

[0027] If (reason code=4 (or 5 or 6)) and (length of call>4 minutes) and (number of handovers=0). The corresponding KPI value would indicate how many broken calls fulfilling that condition occurred during the observation period. These KPI values are then put together to form a vector for the learning and analyzing processes.
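The counting described above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the class `RttReport`, its field names, the second condition and the sample reports are hypothetical and serve only to show how reports fulfilling a condition are counted into a KPI vector for one observation period.

```python
from dataclasses import dataclass


@dataclass
class RttReport:
    # Hypothetical field names; real RTT reports define their own layout.
    reason_code: int
    call_length_min: float
    handovers: int


def is_long_call_break(report: RttReport) -> bool:
    """Condition from the text: reason code 4, 5 or 6, call longer than
    4 minutes, and no handovers during the call."""
    return (report.reason_code in (4, 5, 6)
            and report.call_length_min > 4
            and report.handovers == 0)


def kpi_vector(reports, conditions):
    """Count, per condition, the reports of one observation period that
    fulfil it; the counts form the KPI vector."""
    counts = [0] * len(conditions)
    for report in reports:
        for i, condition in enumerate(conditions):
            if condition(report):
                counts[i] += 1
    return counts


# Made-up reports for one observation period.
period = [
    RttReport(4, 6.5, 0),
    RttReport(5, 2.0, 0),
    RttReport(6, 10.0, 0),
    RttReport(4, 7.0, 2),
    RttReport(1, 9.0, 0),
]
# The second condition (any handover) is an illustrative extra KPI.
conditions = [is_long_call_break, lambda r: r.handovers > 0]
vector = kpi_vector(period, conditions)
```

Each condition is an ordinary predicate, so further KPIs can be added to the list without changing the counting loop.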

[0028] In accordance with one aspect of the invention, the invention provides a system for collecting data from an observable object, e.g. a network element, in real-time and for automatically detecting, from this data, anomalies in the quality of service in comparison to the normal behaviour of the observable object. Anomalies are detected based on user-definable key performance indicators (KPIs) of the observable object. KPIs measure the quality of service seen by the subscribers. Detected anomalies indicate potential problems in the quality of service.

[0029] The system can be connected to any network element which sends a report to the system in real-time about an event, such as a call attempt or a short message, which occurred in the observable object or due to some action of an observable object. Reports are defined in configuration files enabling the adaptation of the system to any observable object. Various data are included in the reports. The reports preferably contain fields about subscriber information, event details, used network resources, used additional services and quality indicators. These fields can be used by the user to define KPIs.

[0030] The system learns the normal values of the KPI vectors and is thus capable of detecting and indicating abnormal behaviour of the observable object when the actual KPI vectors deviate significantly from the learned KPI vectors.

[0031] These indications can be interpreted as alarms about potential problems. In a preferred implementation, the system not only learns the normal values of the KPIs, but also in what combinations they occur. A set of perfectly normal KPI values can then represent an anomaly, if the values occur in an unusual combination.

[0032] Advantages of the invention include the following: provision of a flexible solution for any observable object. Further, the solution is independent of the observable object. The KPIs can be defined by the users. User-definable KPIs are easy to introduce and monitor. The KPIs are preferably updated in real-time. In practice, KPI updating intervals, i.e. monitoring intervals, of more than e.g. 5 or 10 seconds, preferably 30 seconds or more, are sufficient. The amount of work is reduced because there is no need to define and iterate alarm thresholds, and no need for maintenance after the KPIs are defined. Where the normal situation changes slowly, the provided system and method automatically adapt and treat the changed situation as normal.

[0033] The invention provides means and functions for monitoring applications. Real-time network surveillance and cost-efficient operations are possible at both the regional and global level. Alarm filtering applications help to reduce operator workload by adjusting the network information flow to an optimal level. The invention may provide alarm manuals and alarm history information to assist in troubleshooting network problems.

[0034] The invention allows efficient problem detection and working procedures. By means of centralised monitoring tasks, automation and integration of other management systems is possible. The workload and costs of managing the network are reduced as a result of improved network quality through rapid problem detection and correction. Fewer site visits are necessary due to centralised operations.

[0035] The invention provides adjustable alarm sensitivity. Features of a preferred embodiment include: receiving traffic reports from observable objects, e.g. network elements; using reports in counting key performance indicators (KPI); forming vectors of KPIs; and using the vectors as input data to an algorithm that learns the normal functioning of the observable objects and detects anomalies.

BRIEF DESCRIPTION OF THE DRAWINGS

[0036] FIG. 1 illustrates a basic structure of a communication system in accordance with an embodiment of the invention;

[0037] FIG. 2 shows the structure of another embodiment of the invention;

[0038] FIG. 3 shows the steps of another embodiment of the invention;

[0039] FIG. 4 shows a self-organizing map;

[0040] FIG. 5 is a variation of FIG. 4, with circles centred around the neurons of the SOM;

[0041] FIG. 6 is a process chart illustrating a preferred embodiment; and

[0042] FIGS. 7A to 7C illustrate different presentations of time.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS OF THE INVENTION

[0043] Basically, the invention provides a configurable system and method for collecting data from various observable objects, e.g. network elements, parts of them, users, user groups or services etc., and detecting anomalies in the QoS automatically, by comparing the data to data that has been collected when the network is working normally. Anomalies are detected based on user-definable “key performance indicators” (KPI) of the observable object. The system learns the normal values of the KPI vectors and is thus capable of indicating abnormal behaviour of the network. The system raises an alarm for an anomaly when the deviation from normal behaviour is strong enough. A benefit is that the user does not need to enter absolute threshold values.

[0044] According to a preferred embodiment, the invention provides, or is used in, a system that monitors the functioning of a communication network, e.g. a mobile network, and detects anomalies. The idea is that the system learns the normal functioning of the network, and is thus able to detect what is not normal.

[0045] In one of the preferred implementations of the invention, the system or process may include the following steps:

[0046] Receiving of RTT reports (RTT=Real Time Traffic report, preferably delivered as a UDP message) from at least one network element. An RTT report indicates some events associated with the observable objects, and it consists of several fields describing different attributes of the event (e.g. event code, reason code, time information, subscriber identification). There can be an unlimited number of fields in the RTT report, e.g. a report from a Mobile Switching Center (MSC) has about 100 fields;

[0047] RTT reports will be used in counting KPI (Key Performance Indicator) values. The KPI value, or one of the KPI values, may be the number of RTT reports in a given time period that fulfil some predetermined condition, e.g. conditions for field values.

[0048] In one KPI value, there may be several field value conditions combined with some logical function (ANDed or ORed), e.g.

KPI(1)=number of (RTT(field1)=value1 and value2<RTT(field2)<value3),

[0049] or,

KPI(2)=number of (RTT(field1)>value1).

[0050] Forming of vectors, by using the KPI values as vector components. The vector describes the functioning of the observable object.

[0051] Using the vectors as input data to an algorithm. In this invention, the algorithm preferably is a neural network algorithm such as SOM (self-organizing map) or a clustering algorithm such as the K-means algorithm or some other corresponding neural or statistical algorithm.

[0052] The algorithm will learn the normal functioning of the network, and the algorithm will form a profile describing the normal functioning of the network. In a preferred embodiment the profile consists of nodes or cluster centroids.

[0053] Finding best mapping nodes or cluster centroids for incoming vectors;

[0054] Counting the distance between new incoming vectors and the best mapping node or cluster centroid.

[0055] Counting a distribution for distances;

[0056] Finding out from the distribution whether the probability of an incoming distance is less than a predetermined value;

[0057] Raising an alarm if the answer is yes.

[0058] According to one aspect of one of the preferred embodiments of the invention, it includes the steps of forming vectors of counted KPI values, and inputting those vectors to a neural algorithm or the like.

[0059] An RTT-report (‘Real Time Traffic’ report) is a structured report of an event in the network. RTT reports are binary data records consisting of fixed-width fields in a predefined order. The structure of the report is definable e.g. by a user. The report contains fields like subscriber number, reporting time, time of A-call, time of B-answer, call ending type (DX Cause), etc. A network element (NE) sends an RTT-report to a system after a certain event. In a system, e.g. a Traffica system, KPIs are updated based on the values of different fields in the RTT report. Note that the system can combine data from different NEs, which means that one can have network-wide or system-wide KPIs.

[0060] In GSM networks, an RTT report can be generated after each call, short message, handover, location update etc. RTT reports for calls contain fields such as:

[0061] start time of the call,

[0062] end time of the call,

[0063] identification of A and B subscriber (MSISDN, IMSI),

[0064] location information of A and B subscriber (cell, lac),

[0065] used network resources (BSC, circuit group),

[0066] used mobile equipment (IMEI),

[0067] used services,

[0068] clear code (DX cause), i.e. the reason code for call ending,

[0069] etc.

[0070] In the following, an example of an RTT report definition is listed.

[0071] Name of a field, offset, data type+length

[0072] Sub_Id 0 UINT16

[0073] Report_Date 2 BCDDATE

[0074] Report_Time 6 BCDTIME

[0075] A_Caller_Id 10 UINT32

[0076] B_Caller_Id 14 UINT32

[0077] B_Answered_Date 18 BCDDATE

[0078] B_Answered_Time 22 BCDTIME

[0079] Dx_Cause 26 UINT16

[0080] and so on.
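As an illustration of how such a definition could drive the parsing of a binary record, the following Python sketch reads the eight listed fields with the standard `struct` module. Several points are assumptions not stated in the text: integers are taken to be little-endian, BCDDATE and BCDTIME are taken to be four bytes of packed BCD (a width inferred from the listed offsets), and the sample record is made up.

```python
import struct

# (field name, offset, type) as listed in the example definition.
FIELDS = [
    ("Sub_Id", 0, "UINT16"),
    ("Report_Date", 2, "BCDDATE"),
    ("Report_Time", 6, "BCDTIME"),
    ("A_Caller_Id", 10, "UINT32"),
    ("B_Caller_Id", 14, "UINT32"),
    ("B_Answered_Date", 18, "BCDDATE"),
    ("B_Answered_Time", 22, "BCDTIME"),
    ("Dx_Cause", 26, "UINT16"),
]

# Assumptions: little-endian integers, 4-byte packed BCD date/time fields.
INT_FORMATS = {"UINT16": "<H", "UINT32": "<I"}
BCD_WIDTH = 4


def parse_report(record: bytes) -> dict:
    """Extract the listed fields from one fixed-width binary record."""
    values = {}
    for name, offset, kind in FIELDS:
        if kind in INT_FORMATS:
            (values[name],) = struct.unpack_from(INT_FORMATS[kind], record, offset)
        else:
            # Packed BCD stores one decimal digit per nibble, so the
            # hexadecimal rendering of the bytes is the digit string.
            values[name] = record[offset:offset + BCD_WIDTH].hex()
    return values


# Made-up 28-byte sample record for demonstration.
sample = (
    struct.pack("<H", 7)
    + bytes.fromhex("20240131")
    + bytes.fromhex("23595900")
    + struct.pack("<I", 1001)
    + struct.pack("<I", 2002)
    + bytes.fromhex("20240131")
    + bytes.fromhex("23580000")
    + struct.pack("<H", 5)
)
parsed = parse_report(sample)
```

Because the layout is held in a plain table, a new report type only needs a new `FIELDS` list, which mirrors the idea of defining report structures in configuration files.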

[0081] In GPRS networks, an RTT report can be generated after each attach/detach, PDP context activation/deactivation etc. On the system side, e.g. Traffica side, the structure of the RTT report is preferably defined in ASCII files, meaning that defining new RTT report types is very simple and flexible.

[0082] The RTT-reports can also be replaced by any (semi-)structured messages, e.g. alarms, that describe network operations or events and that can be handled by a feature extractor. Furthermore, they can be sent not only after certain events, but also after a certain fixed time period or after a fixed number of certain events.

[0083] KPI (Key Performance Indicator) values are functions of one or more RTT reports' field values and/or other KPIs. In the simplest case they are counters. For counting the KPI values using the RTT reports, a KPI value may be simply increased by one if a user-definable condition is true for the analysed RTT report. The user-definable condition is a comparison between some fields of RTT report data and predetermined values.

[0084] The KPI value can also be, in some of the embodiments, a function of at least one field value and/or other KPIs. For example, if the field value already counts something, such as the usage of a resource X (e.g. the number of used time slots or seconds of used processor time), the KPI value can be a cumulative sum of the field value in a given time period. Furthermore, the field does not necessarily have to be a discrete count of something; it can be any kind of quantity measure.

[0085] Vectors are formed by using the KPI values as vector components such that the vector describes the functioning of the observable object, e.g. the NE, a part of a NE or a subscriber group. It is e.g. possible to attach each calculated KPI value as one component to the vector. In case one or some KPI values cause bad behavior in the sum model, such KPI values can be dropped from the vector. In principle there are no restrictions. One can select those KPIs that best describe the phenomena one wants to monitor.

[0086] Vectors are formed because the used algorithm uses vectors as input data. The algorithm may be e.g. a self-organizing map (SOM) algorithm. A SOM algorithm is used in other network optimizing tasks as well and is a neural network algorithm.

[0087] Instead of a SOM-based anomaly detection method, other methods may be used as well. These include, for example, K-means clustering, which closely resembles the used method, or other neural or statistical methods, e.g. from the field of multivariate analysis in statistics, provided that the methods are able to find anomalies (sometimes also called novelties) or sudden changes in multivariate time series.

[0088] There are several applicable distance measures for counting the distance between new incoming vectors and the best mapping node, depending on the embodiment. The distance measure can be e.g. the Kullback-Leibler distance for histograms, or in some other embodiment the so-called city block distance, or whatever norm is best suited for the embodiment.
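The search for the best mapping node with an exchangeable distance measure can be sketched in Python as follows. The function names and the example profile nodes are hypothetical; the Euclidean and city block distances are shown as two possible choices, not as the measures prescribed by any embodiment.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City block (L1) distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_mapping_node(vector, nodes, distance=euclidean):
    """Return (index, distance) of the profile node closest to the vector."""
    distances = [distance(vector, node) for node in nodes]
    index = min(range(len(nodes)), key=distances.__getitem__)
    return index, distances[index]


# Made-up profile with two nodes and one incoming KPI vector.
nodes = [(35.0, 15.0, 0.0), (5.0, 2.0, 1.0)]
index, dist = best_mapping_node((37.0, 15.0, 0.0), nodes, distance=city_block)
```

Passing the distance as a parameter keeps the node search unchanged when another norm suits the embodiment better.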

[0089] The counting of a distribution for distances is preferably done in discrete steps. All the incoming distances are recorded in a table or the like, and it is counted how many distances occur in each discrete range, e.g. how many distances have been recorded in the ranges a . . . b, b . . . c, c . . . d, d . . . e, and how many recorded distances there are in total. The distribution can then be calculated by dividing the count of each range by the total.

[0090] The distribution is a function that directly gives the probability of an incoming distance being less than a predetermined value. Given an incoming distance as input, the distribution outputs how probable it is that a distance of that length occurs.
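A discrete distance distribution of this kind can be sketched in Python as below. The range edges and recorded distances are made up, and the final open-ended range is an added convenience. The function `tail_probability` is one possible reading, chosen here because the description treats small probability values as anomalous: it returns the share of recorded distances lying in the same range as the new distance or in a higher one.

```python
from bisect import bisect_right


def distance_distribution(distances, edges):
    """Relative frequency of recorded distances in each range
    [edges[i], edges[i+1]); the last range is open-ended."""
    counts = [0] * len(edges)
    for d in distances:
        counts[max(bisect_right(edges, d) - 1, 0)] += 1
    total = len(distances)
    return [c / total for c in counts]


def tail_probability(distance, distribution, edges):
    """Share of recorded distances in the range of `distance` or above;
    a small value means the distance is unusually large."""
    start = max(bisect_right(edges, distance) - 1, 0)
    return sum(distribution[start:])


# Made-up recorded distances and range edges a, b, c, d, e.
recorded = [0.5, 0.8, 1.2, 1.4, 1.9, 2.5, 2.6, 3.1, 4.2, 7.5]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
distribution = distance_distribution(recorded, edges)
far = tail_probability(3.5, distribution, edges)
near = tail_probability(0.2, distribution, edges)
```

Multiplying the tail probability by 100 gives a value on a 0 to 100 scale, where values near 0 correspond to vectors far from the profile.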

[0091] FIG. 1 shows the system architecture of a structure representing an embodiment of the invention. The embodiment can be used in a communication system or network, e.g. of GPRS (General Packet Radio Service) or GSM structure, such as a public land mobile network (PLMN). Mobile or fixed user terminals, e.g. GSM mobile stations (MS), are able to receive services from, or communicate with, other network elements.

[0092] FIG. 1 shows components of a first embodiment and their interaction. The components include one or more observable objects (OO), e.g. network elements (NE) 1. The NE 1 can be of any type that reports about its performance by some kind of reports. A device or function 3, e.g. a storage, includes KPI definitions, i.e. definitions of the KPIs that are generated based on the information sent by Network Element 1. The KPI definitions can be common to all network elements of the same type, or can be specific for each element. The KPI definitions may be changeable on the fly, e.g. immediately when an operator changes the definitions, and can be given either as rules or formulas or in any other format that can be executed by a computer.

[0093] A KPI extractor 2 parses incoming messages from the NE 1, and possibly from further NEs, and creates KPI vectors that describe the behavior of NE 1 or of each NE. The KPI extractor 2 works on-line in real time, and delivers KPI vectors to a profile adapter 4 and an anomaly detector 5 at the times defined in the KPI definitions of means 3.

[0094] The profile adapter 4 creates and updates the behavior descriptor, i.e. NE profile, for each monitored NE 1 or each NE type. The update can be done in real time either after each generation or receipt of a KPI vector, or periodically after a given time interval or after a given number of received KPI vectors. The form of the generated profile depends on the used anomaly detection method.

[0095] The anomaly detector 5 compares the most recent KPI vector of KPI extractor 2 to the behavior profile, i.e. the NE profile received from profile adapter 4, and detects differences of the KPI vector in comparison to the NE profile. The detector 5 gets the profile of each NE or NE type from the profile adapter 4 either by request or periodically or whenever the adapter 4 is ready to submit such a profile. This can be implemented with one of several well-known anomaly detection methods.

[0096] The anomaly detector 5 sends a report, i.e. an Anomaly Information report, to a monitoring/reporting device 6 whenever it detects an interesting deviation in the behavior of the network element(s) in comparison to the profile of the NE 1. This monitoring/reporting device can be either a dedicated monitoring application in a computer or an SMS- or HTTP-based application in a mobile device.

[0097] The components 2 to 6 can be implemented as processes in one or several separate computing devices or as specific circuits that are integrated with the network monitoring devices.

[0098] One example of the structure of an implementation architecture of the invention is shown in FIG. 2. As shown in FIG. 2, a network element (NE) 10 sends RTT reports, e.g. as UDP (User Datagram Protocol) messages, to a network element TNES 11 which redirects such report messages to a network element TCAD 12. The TCAD 12 includes the means or functions 2, 4, 5 of FIG. 1, i.e. provides a KPI Extractor, Profile Adapter, and Anomaly Detector function/means. The TCAD 12 provides anomaly information reports and heartbeat reports and sends these reports to a network element TS 13. The TS 13 registers such reports and issues alarms in case of need.

[0099] The embodiments shown in FIGS. 1, 2 provide a monitoring solution and contain means for collecting and storing real-time information from the network to detect faults in network elements and to monitor the quality of service provided by the network. This provides visibility to the network status in real-time at any time and anywhere.

[0100] FIG. 3 illustrates method steps of an embodiment of the present invention. As shown by the arrow “Flow of RTT reports”, the monitored network element sends RTT reports to a network component or another entity, e.g. CCMA. The CCMA (Clear Code Matrix) is a component of the Traffica product and it is responsible for calculating the KPI values from the RTT reports. It is a user-defined decision tree which consists of nodes and counters. Counters can be KPIs as such, or KPIs can be calculated from two or more counters. There, in step 1, user-defined KPIs are updated at the counter tree, e.g. the CCMA, according to the received RTT reports. With an appropriate timing, e.g. every five minutes, a vector consisting of KPI values will be output to the learning and analyzing processes. As shown in FIG. 3, each output vector may be a sequence of KPI values, e.g. “(37, 15, 0, 3, 1)”. This forming of a KPI vector for the, or each, monitored network element is depicted by step 2 of FIG. 3.

[0101] Each KPI vector is input to a learning process, step 3, as well as to an analyzing process, step 4. In the learning process, the received KPI vectors are used in periodically counting a profile for the monitored network element(s). The profile represents the functioning of the network element. When using the SOM algorithm, the profile is the trained SOM itself. The SOM then consists of k nodes (neurons). Each node is a vector of the same dimension as the KPI vectors. The number of nodes (k) ought to depend on the number of KPI vectors (n) in the training data. It could e.g. be desirable to have approximately n=20*k. When using SOM or K-means, the profile consists of as many vectors as there are nodes or neurons, together with the distance distribution.
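For the K-means variant, the learning of a profile from KPI vectors can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the first k training vectors are used as initial centroids to keep the sketch deterministic, a fixed number of iterations replaces a convergence test, and the helper `node_count` merely applies the suggested ratio n=20*k. The training data are made up.

```python
def kmeans(vectors, k, iterations=20):
    """Plain K-means; the first k vectors serve as initial centroids
    (a simplification chosen to keep the sketch deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to the centroid with the smallest
            # squared Euclidean distance.
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[i])))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def node_count(n_vectors, vectors_per_node=20):
    """Number of nodes k so that roughly n = 20 * k, as suggested in the text."""
    return max(1, n_vectors // vectors_per_node)


# Made-up two-dimensional KPI vectors forming two obvious groups.
training = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (10.0, 9.0), (2.0, 1.0), (9.0, 10.0)]
profile = kmeans(training, k=2)
```

The returned centroids play the role of the profile vectors; the distance distribution would be collected separately from the distances of the training vectors to their nearest centroid.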

[0102] In the analyzing process, step 4, any new KPI vector received from step 2 will be compared to the profile formed in step 3 based on the previous KPI vectors, so as to detect any surprising deviations therebetween which might indicate an alarm situation. Preferably, an anomaly P-value will be calculated for each new incoming vector. The P-value can range e.g. between 0 and 100. The closer to 0 the value is, the more the new vector differs from the profile. In addition, the process will count which vector components differ from the profile the most.

[0106] In the example shown in FIG. 3, the calculated P-value is 3.67. An anomaly threshold, which may have the value of 5.0, is set for comparison to the P-value. In step 5, the P-value will be compared to the predetermined anomaly threshold. If the P-value is lower than the threshold, an anomaly indication will be generated, as indicated by the arrow shown in FIG. 3.

[0107] In step 6, an alarm will be generated if the anomaly indication fulfils the conditions determined by the user.

[0108] The anomaly indication sent as a result of a positive comparison in step 5 includes information about the actual P-value leading to the alarm, and about which, e.g. three, components of the actual KPI vector differed the most from the profile. The mentioned number of three is just an example; the number of components reported as differing the most from the profile can range from zero up to the number of components in the KPI vectors.
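The comparison of step 5 and the content of the anomaly indication can be sketched in Python as follows. The function name, the dictionary layout and the profile node values are hypothetical, and measuring the component differences against the best mapping node is an assumption of this sketch; the KPI vector, the P-value of 3.67 and the threshold of 5.0 are taken from the example.

```python
def anomaly_indication(vector, node, p_value, threshold=5.0, top=3):
    """Return None for normal vectors; otherwise the P-value and the indices
    of the `top` components that differ most from the best mapping node."""
    if p_value >= threshold:
        return None
    by_difference = sorted(range(len(vector)),
                           key=lambda i: abs(vector[i] - node[i]),
                           reverse=True)
    return {"p_value": p_value, "components": by_difference[:top]}


# The node values are made up for illustration.
indication = anomaly_indication(
    vector=(37, 15, 0, 3, 1), node=(20, 14, 0, 9, 1), p_value=3.67)
```

Setting `top` to any value between zero and the vector length changes how many components are reported, in line with the range described above.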

[0109] The invention can also be implemented to support technologies other than GSM. The adaptive monitoring and reporting can be implemented e.g. in a GPRS NE, e.g. a support node such as SGSN (Traffic for GPRS) or in CPS & MSS (Traffic for 3G and All-IP) network elements (CPS=Call Processing Subsystem; MSS=Management Statistic Subsystem). The invention will support these adaptation layers as well.

[0111] The second embodiment of the invention described below and shown in FIGS. 4 to 7 relates to anomaly detection in computer or telecommunication networks in which the concept of normal behaviour varies with time. The details of this embodiment as described below and shown in FIGS. 4 to 7 can be arbitrarily combined with the details of the above discussed embodiments. More particularly, this embodiment relates especially to teaching an anomaly detection mechanism. An example of such an anomaly detection mechanism is based on self-organizing maps (SOM).

[0112] A problem with known SOM-based ADS mechanisms is that they are restricted to detecting problems in systems having a well-defined normal behaviour. In most telecommunication networks the concept of “normal behaviour” is, at best, vague. A network element's behaviour at peak time is very different from its behaviour at the quiet hours just before dawn. More precisely, most often it is the users who cause the variation in what is called normal. In other words, known ADS mechanisms do not readily lend themselves to detecting problems in systems or elements whose normal behaviour varies with time.

[0113] Accordingly, this embodiment of the invention provides a mechanism for teaching ADS mechanisms which rely on the concept of normal behaviour in a system in which the normal behaviour varies significantly with time. In this context, “significantly” means that a behaviour which is normal at certain times is to be considered anomalous at other times.

[0114] This aspect of the invention is partially based on the idea that time is used as a component of the input data to the ADS. But it is not sufficient to include time in the input data if time is represented as a quantity which increases linearly from a fixed start point. This is because such a presentation of time is not repeating, and the ADS would not know when a certain behaviour was normal and when anomalous. It is also not sufficient to introduce time as a periodic quantity (such as a 24-hour clock) because the daily jumps from 23:59 to 00:00 would introduce severe discontinuities to the input data.

[0115] Accordingly, the embodiment is also based on formulating a presentation of time which is suitable for solving the problem caused by the time-varying normal behaviour of systems such as telecommunication networks. According to this aspect of the invention, the presentation of time which is used as a component of the input data is 1) periodic, 2) continuous and 3) unambiguous (within the period of the input data). A preferred example of such a presentation of time (t) is a projection to x and y components such that x=sin(2πt/L) and y=cos(2πt/L) where L is the length of the period of variation, typically 24 hours or a week. At first sight, such a two-dimensional presentation of time would seem to use both dimensions of a two-dimensional SOM map, but such SOM maps are for visualization purposes only, and inside a computer memory, an SOM map can have an arbitrary number of dimensions.
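The projection x=sin(2πt/L), y=cos(2πt/L) can be sketched as follows. This is only an illustration: the function name `encode_time`, the use of hours as the unit of t and the default period of 24 hours are assumptions made for the example.

```python
import math

def encode_time(t_hours, period_hours=24.0):
    """Project time t onto a circle: x = sin(2*pi*t/L), y = cos(2*pi*t/L)."""
    angle = 2.0 * math.pi * t_hours / period_hours
    return math.sin(angle), math.cos(angle)

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 23:59 and 00:00 are neighbours on the circle, whereas a 24-hour clock
# value would jump from 23.98 back to 0.
before_midnight = encode_time(23.0 + 59.0 / 60.0)
midnight = encode_time(0.0)
gap = euclidean(before_midnight, midnight)
```

With this presentation the distance between two time stamps depends only on how far apart they are within the period, so the day change does not appear as a jump in the input data.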

[0116] The continuity requirement for the presentation of time should be interpreted with the constraints of reality in mind, however. All digital systems have a finite resolution, which means that no presentation of time can be perfectly continuous. In addition, some memory can be saved when storing the observations by omitting some of the least significant bits of the observations, i.e. by quantization. For the purposes of the invention, a presentation of time is sufficiently continuous (=“large-scale continuous”) if it does not contain discontinuities which are large enough to affect a decision between normal and anomalous behaviour. For example, in a telecommunication network with a usage period of 24 hours, discontinuities (quantizations) of up to about 10 or 15 minutes may be considered acceptable if there are no times at which user behaviour changes so fast that a certain type of behaviour is considered normal at a certain point of time but anomalous 10 or 15 minutes later. In contrast, the presentation of time for a system which opens and closes (or radically changes its behaviour in other ways) at well-defined times must have considerably smaller discontinuities.

[0117] Some memory can be saved if it is known beforehand that changes in the behaviour of the observable objects are small and/or gradual during certain parts of the period (such as nights) and more pronounced during other parts (such as days). In such a case, the presentation of time can be such that the resolution is variable within the period. This means that one bit may represent, say, 30 minutes during the quiet parts of the period and 5-15 minutes during the more active parts of the period.

[0118] In some cases a single period (typically 24 hours) is sufficient, but sometimes two or three nested periods may be required. For example, the presentation of time may comprise one component with a 24-hour period and another with a one-week period. For locations or situations strongly affected by seasonal changes, a third component with a one-year period may be required.
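A sketch of a presentation of time with nested periods, using one sine/cosine pair per period, is given below. The names `PERIODS_HOURS` and `time_components` and the choice of 365.25 days for the one-year period are assumptions made for the illustration.

```python
import math

# Period lengths in hours; the year length is an assumed approximation.
PERIODS_HOURS = {"day": 24.0, "week": 7 * 24.0, "year": 365.25 * 24.0}

def time_components(t_hours, periods=("day", "week")):
    """Return one (sin, cos) pair per selected period as a flat list."""
    features = []
    for name in periods:
        angle = 2.0 * math.pi * t_hours / PERIODS_HOURS[name]
        features.extend([math.sin(angle), math.cos(angle)])
    return features

# 10:00 on two consecutive days: the daily pair is identical,
# while the weekly pair tells the two days apart.
first_day = time_components(10.0)
second_day = time_components(34.0)
```

Each additional period adds two input data components, so a day/week/year presentation contributes six components to the feature vector.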

[0119] This aspect of the invention is not limited to self-organizing maps but can be used with other clustering techniques such as k-means or other corresponding neural or statistical algorithms.

[0120] According to a preferred embodiment of the invention, all variables (components of the input data), including the presentation of time, are scaled such that the variance of each variable is the same, preferably one.
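One way to obtain equal variances is to divide each variable by its standard deviation. The sketch below assumes that each variable is available as a list of observed values and that the population variance is meant; the function name `scale_to_unit_variance` and the handling of constant variables are assumptions of the example.

```python
import statistics

def scale_to_unit_variance(columns):
    """Scale each variable (one list per variable) to variance one."""
    scaled = []
    for values in columns:
        sd = statistics.pstdev(values)  # population standard deviation
        if sd == 0:
            # Assumption: a constant variable cannot be scaled and is kept as is.
            scaled.append(list(values))
        else:
            scaled.append([v / sd for v in values])
    return scaled

throughput = [1.0, 2.0, 3.0, 4.0]
failed_calls = [10.0, 20.0, 30.0, 40.0]
scaled = scale_to_unit_variance([throughput, failed_calls])
```

After scaling, no single variable dominates the distance calculation merely because of its unit of measurement.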

[0121] The invention can be implemented as software routines in a computer system having access to the objects to be observed.

[0122] FIG. 4 shows a self-organizing map;

[0123] FIG. 5 is a variation of FIG. 4, with circles centred around the neurons of the SOM;

[0124] FIG. 6 is a process chart illustrating the second embodiment; and

[0125] FIGS. 7A to 7C illustrate different presentations of time.

[0126] The following embodiments of the invention will be described in connection with self-organizing map (SOM) technology. FIG. 4 shows a self-organizing map. The objective of a SOM test for anomaly is to test whether the current behaviour of an observable object is anomalous or not. The hypotheses to be tested are:

[0127] H₀: The most recent observation is not anomalous.

[0128] H₁: The most recent observation is anomalous.

[0129] The behaviour of an observable object can be very consistent, which means that it is concentrated to one or a couple of regions in the feature space. On the other hand, the behaviour can also be more scattered in the feature space, which would signify a more irregular behaviour. The idea of the SOM test for anomaly is to approximate the normal behaviour of an observable object with a small object-specific SOM. The previous behaviour is assumed to represent the normal behaviour of the observable object. Anomalous observations can be omitted from the previous behaviour when training the SOM.

[0130] The SOM shown in FIG. 4 is a one-dimensional (8*1) SOM with 200 points of artificial data, commonly depicted by reference number 23. FIG. 5 shows the same SOM with circles or ellipses 31 plotted using the neurons 24 of the SOM as centres. For clarity, FIGS. 4 and 5 are shown with only two features 21 and 22, but in reality, the number of observable features can be much larger than two.

[0131] 200 points of artificial data for two features have been plotted in the plane together with the neurons of a map of size 8*1 trained with the data. The one-dimensional SOM approximates two clusters (having four ellipses 31 each) of data quite well. Note that the data in FIG. 4 is two-dimensional to allow visualization to humans. In a computer system, the number of dimensions can be much larger than two. The Best Mapping Unit (BMU) for a data point f_(k) in an SOM is the neuron w_(i) having the smallest distance to the data point. This is expressed in equation (1), where dist stands for the distance.

$\begin{matrix}{{BMU} = {\underset{i}{\arg \quad \min}\left\{ {{dist}\left( {f_{k},w_{i}} \right)} \right\}}} & (1)\end{matrix}$
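Equation (1) can be illustrated with a short sketch that uses the Euclidean distance as dist. The function names and the three example neurons are assumptions made for the illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_mapping_unit(f_k, neurons):
    """Return (index, distance) of the neuron w_i closest to data point f_k."""
    distances = [euclidean(f_k, w_i) for w_i in neurons]
    index = min(range(len(neurons)), key=lambda i: distances[i])
    return index, distances[index]

# Hypothetical neurons of a small map and one data point.
neurons = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
index, distance = best_mapping_unit((0.9, 1.2), neurons)
```

Here the second neuron is the BMU, and the returned distance is the BMU distance used later in the anomaly test.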

[0132] Here, we assume that a Euclidean distance to the BMU is used to measure how much an observation deviates from the normal object-specific behaviour, but other types of distance measurements can be used. The anomaly P-value is a measure of the degree of anomaly for an observation. On the basis of this value, the hypothesis H₀ is accepted or rejected. Calculation of the anomaly P-value will be described in connection with the use phase of the SOM-based ADS.

[0133] An ADS mechanism involves three major phases: design, teaching and use. The design phase typically involves some decisions and comprises the following steps:

[0134] 1. Selecting a set of features describing the target object. The feature vector describing the object is denoted by f. (The target object is the object to be observed, such as a network element.) This step is described in detail in the above referenced article. For the purposes of the present aspect of the invention, it suffices to say that the features are parameters which can be used to make a distinction between normal and anomalous behaviour.

[0135] 2. Formulating a hypothesis for detecting anomalous behaviour. The objective is to test the most recent observation f_(n+1) for anomaly. The hypothesis to be tested is H₀: The most recent observation f_(n+1) is not anomalous. The alternative hypothesis is H₁: The most recent observation f_(n+1) is anomalous. (The suffix n will be described in connection with the use phase.)

[0136] The teaching (learning) phase typically comprises the following steps:

[0137] 1. Observing normal behaviour of the target object. For example, n measurements (f₁, f₂, . . . , f_(n)) of the feature vector are collected.

[0138] 2. Training an SOM with m neurons using the measurements (f₁, f₂, . . . , f_(n)) as training data. The number of neurons in the map, m, is selected to be much smaller than n, for example n/10.
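The teaching phase can be sketched with a minimal one-dimensional SOM. The text does not prescribe a training schedule, so the Gaussian neighbourhood, the linearly decaying learning rate, the number of epochs and the artificial two-cluster data below are all assumptions made for the illustration; the map size 8*1 and n=200 follow the example of FIG. 4.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def train_som_1d(data, m, epochs=20, lr0=0.5, seed=0):
    """Train an m*1 SOM on the measurements; returns m neuron vectors."""
    rng = random.Random(seed)
    neurons = [list(rng.choice(data)) for _ in range(m)]  # start at data points
    sigma0 = max(m / 2.0, 1.0)       # initial neighbourhood width (assumed)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for f in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(m), key=lambda i: euclidean(f, neurons[i]))
            for i in range(m):
                # Neurons close to the BMU on the map move most towards f.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(f)):
                    neurons[i][d] += lr * h * (f[d] - neurons[i][d])
            step += 1
    return neurons

# n = 200 artificial measurements of two features, forming two clusters.
data_rng = random.Random(1)
data = ([(data_rng.gauss(0.0, 0.3), data_rng.gauss(0.0, 0.3)) for _ in range(100)]
        + [(data_rng.gauss(5.0, 0.3), data_rng.gauss(5.0, 0.3)) for _ in range(100)])
neurons = train_som_1d(data, m=8)
```

Because every update moves a neuron part of the way towards a measurement, the trained neurons stay within the region spanned by the training data.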

[0139] The use phase typically comprises the following steps:

[0140] 1. Omitting neurons in the SOM that are not Best Mapping Units (BMU) for any of the data points (f₁, f₂, . . . , f_(n)).

[0141] 2. Calculating the BMU distances for (f₁, f₂, . . . , f_(n)) from the trained SOM. These distances are denoted by (D₁, D₂, . . . , D_(n)).

[0142] 3. Calculating the BMU distance for the observation f_(n+1). This distance is denoted by D_(n+1).

[0143] 4. Calculating the anomaly P-value. Let B be the number of the Best Mapping Unit distances (D₁, D₂, . . . , D_(n)) higher than D_(n+1). The anomaly P-value for a certain object is then calculated from:

$\begin{matrix}{P_{n + 1} = \frac{B}{n}} & (2)\end{matrix}$

[0144] 5. Accepting or rejecting the null hypothesis on the basis of the anomaly P-value. If the anomaly P-value is higher than the anomaly P-value threshold, the null hypothesis H₀ is accepted (the most recent observation is considered normal). If, on the other hand, the anomaly P-value is smaller than the anomaly P-value threshold, the null hypothesis H₀ is rejected and the most recent data point is assumed anomalous.
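The five steps of the use phase can be sketched as follows, assuming a Euclidean distance and an already trained map. The function names, the example neurons and the example data are hypothetical; how to treat a P-value exactly equal to the threshold is not stated in the text and is treated here as normal.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def bmu_distance(f, neurons):
    """Distance from observation f to its Best Mapping Unit."""
    return min(euclidean(f, w) for w in neurons)

def anomaly_p_value(training, neurons, f_new):
    # Step 1: omit neurons that are not the BMU of any training point.
    used = {min(range(len(neurons)), key=lambda i: euclidean(f, neurons[i]))
            for f in training}
    active = [neurons[i] for i in sorted(used)]
    # Step 2: BMU distances D_1..D_n of the training data.
    d_train = [bmu_distance(f, active) for f in training]
    # Step 3: BMU distance D_(n+1) of the new observation.
    d_new = bmu_distance(f_new, active)
    # Step 4: P = B / n, where B counts training distances higher than D_(n+1).
    b = sum(1 for d in d_train if d > d_new)
    return b / len(d_train)

def is_anomalous(p_value, threshold):
    # Step 5: reject H0 when the P-value is smaller than the threshold.
    return p_value < threshold

neurons = [(0.0, 0.0), (10.0, 10.0), (100.0, 100.0)]  # last one is never a BMU
training = [(0.0, 1.0), (0.0, 2.0), (1.0, 0.0), (10.0, 11.0), (10.0, 12.0)]
p_normal = anomaly_p_value(training, neurons, (0.0, 1.5))
p_far = anomaly_p_value(training, neurons, (50.0, 50.0))
```

In this example two of the five training distances exceed that of the first observation, giving a P-value of 2/5, whereas no training distance exceeds that of the second observation, giving a P-value of zero.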

[0145] If the test indicates that the object behaviour is anomalous (H₀ is rejected), the k most significantly deviating features can be determined. The k features (components of the feature vector) with the biggest absolute contribution to the BMU distance are the k most significantly deviating features. Equation (3) shows how the most deviating feature can be calculated. This component of the feature vector is given the sub-index md in equation (3). In equation (3) BMU stands for the Best Mapping Unit of the feature vector f_(n+1), and j takes values from zero to the number of features. The other k−1 most deviating features are calculated in a corresponding manner.

$\begin{matrix}{f_{{n + 1},{md}} = {\underset{j}{\arg \quad \max}\left\{ {{abs}\left( {f_{{n + 1},j} - {BMU}_{j}} \right)} \right\}}} & (3)\end{matrix}$
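The selection of the k most deviating features can be sketched by ranking the components by their absolute difference from the BMU. The function name and the example vectors are hypothetical, and indices are counted from zero.

```python
def most_deviating_features(f_new, bmu, k):
    """Indices of the k components with the largest abs(f_j - BMU_j)."""
    contributions = [abs(a - b) for a, b in zip(f_new, bmu)]
    order = sorted(range(len(contributions)),
                   key=lambda j: contributions[j], reverse=True)
    return order[:k]

f_new = (5.0, 1.0, 9.0, 2.0)   # most recent observation
bmu = (4.5, 1.0, 3.0, 4.0)     # its Best Mapping Unit
top3 = most_deviating_features(f_new, bmu, 3)
```

The absolute differences are 0.5, 0, 6 and 2, so the third component deviates the most, followed by the fourth and the first.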

[0146] The situation shown in FIG. 4 can be used as an example. FIG. 4 shows two anomalies, commonly depicted with reference numeral 25. The anomaly P-value for anomaly 1 is 0/200=0. Since none of the data points has a BMU distance greater than that of anomaly 1, the value of the numerator is zero. Correspondingly, the anomaly P-value for anomaly 2 is 7/200=0.035.

[0147] If the Anomaly P-value is smaller than the Anomaly P-value threshold, the null hypothesis H₀ is rejected and an alarm is triggered. The Anomaly P-value threshold can be interpreted as the fraction of observations that will be rejected if the behaviour of the observable object does not deviate from the same observable object's earlier behaviour which was used during the teaching phase. That is, if the null hypothesis is true:

number of alarms = P-value threshold * number of observations   (4)
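As a worked illustration of equation (4), assume a P-value threshold of 0.05 and one observation per minute over 24 hours, i.e. 1440 observations. Both figures are chosen for the example only.

```latex
\begin{align*}
\text{number of alarms} &= \text{P-value threshold} \times \text{number of observations} \\
                        &= 0.05 \times 1440 = 72
\end{align*}
```

Under these assumptions about 72 alarms per day would be expected even when the behaviour of the object has not changed, which shows why the threshold has to be chosen with the acceptable alarm load in mind.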

[0148] On the other hand, if the null hypothesis is not true (the new data is anomalous), the number of rejections (alarms) is higher.

[0149] FIG. 5 shows how a selected P-value threshold can be illustrated for observable object i using d-dimensional spheres (d-spheres) centred at the neurons of the object-specific map. With two-dimensional input data, the d-spheres are circles. Here d stands for the number of dimensions in the input data (f₁, f₂, . . . , f_(n)). In other words, each input data element f₁ through f_(n) is itself a vector with d dimensions. The number of observations for object i falling outside the spheres corresponds to the numerator B in equation (2). The two-dimensional example in FIG. 5 shows such a situation. Here B is 13, which corresponds to quite a high P-value threshold of about 6.5% (13/200=0.065).

[0150] FIG. 6 is a process chart illustrating the second embodiment of the invention. Reference number 302 points to an element of a physical system such as a telecommunication network (as distinguished from a neural network). A physical element may comprise several observable objects. For example, if the physical system element 302 is a telecommunication exchange, its observable objects may comprise throughput, waiting time, number (or percentage) of failed calls and the like. For each unit of time, an indicator collector 306 collects an indicator tuple 304. The tuples are stored in an indicator database 310. Reference 312 points to a data set used for training the neural network (or another learning mechanism) 314. The data set 312 should indicate normal behaviour of the physical element 302. A storage 318 contains trained neural networks. When a physical element 302 is to be observed, the corresponding trained neural network 320 is retrieved from the storage 318 and applied as one input to the anomaly detection mechanism 322. The anomaly detection mechanism's other input is the indicator set 324 to be tested for anomalous behaviour. If the anomaly detection mechanism 322 decides that the behaviour described by the indicator set 324 is anomalous, the anomaly P-value and the most deviating indicators 326 are stored in an anomaly history database 328. At the same time, an alarm 330 is given to a monitoring device 332, such as a computer screen.

[0151] FIGS. 7A to 7C illustrate different presentations of time, some of which are acceptable and some unacceptable. In FIG. 7A, the horizontal axis is the time in units of L where L is the period of input data, which is assumed to be 24 hours. Line 400 shows a straight presentation of time. References 401 to 403 point to three instances of a repeating event which occurs at 24-hour intervals. A problem with this presentation of time is that the presentations of the times are different, and the ADS cannot recognize events 401 to 403 as a recurring event.

[0152] The saw-tooth line 405 is a 24-hour presentation of time, or in other words, a modulo function of time. In this presentation, events occurring at the same time each day have identical representations, but the day changes introduce discontinuities into the input data.

[0153] In FIG. 7B, the sine wave 410 is periodic and continuous, but it is not unambiguous. Events 411 and 412 occur at different times but have identical presentations of time. Assuming that event 411 was normal in the morning, the ADS would not recognize a similar event as an anomaly if it occurred in the evening.
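The ambiguity of a sine-only presentation can be checked numerically. In the sketch below the two times of day (02:00 and 10:00) and the function names are chosen for illustration; which pairs of times coincide depends on the phase of the sine wave.

```python
import math

def sine_only(t_hours, period=24.0):
    """One-component presentation of time: ambiguous within the period."""
    return math.sin(2.0 * math.pi * t_hours / period)

def sine_cosine(t_hours, period=24.0):
    """Two-component presentation of time: unambiguous within the period."""
    angle = 2.0 * math.pi * t_hours / period
    return math.sin(angle), math.cos(angle)

early, late = 2.0, 10.0   # two different times of day, in hours
same_sine = abs(sine_only(early) - sine_only(late)) < 1e-9
cosine_gap = abs(sine_cosine(early)[1] - sine_cosine(late)[1])
```

The two times share the same sine value, so a sine-only presentation cannot tell them apart, whereas their cosine components differ clearly.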

[0154] FIG. 7C shows three acceptable presentations of time. They are all based on the idea that time is represented as a coordinate pair x,y. The circle 420 represents time as {x=sin(2πt/L); y=cos(2πt/L)} where L is the length of the variation period, and 2πt/L is an angle from the x axis. The ellipse 422 is also acceptable as long as it is not so flat as to introduce an ambiguity as to whether a point is on the top half or the bottom half of the ellipse. Even a rectangle 424 can be used. Although several points have identical x or y coordinates, no two points of the rectangle have identical x/y coordinate pairs.

[0155] The sine/cosine combination of the circle 420 is considered a preferred presentation of time because events which are equidistant in time are also equidistant in the presentation of time. However, the sine/cosine combination may be computationally intensive, and some approximations, such as a pair of triangular wave functions with a 90-degree phase shift, can be used.
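A pair of triangular wave functions with a 90-degree phase shift can be sketched as follows. The function names and the particular piecewise-linear definition, which matches sine and cosine at the quarter points of the period, are assumptions of the example.

```python
def triangle(phase):
    """Triangular wave with period 1 that follows the shape of a sine."""
    p = phase % 1.0
    if p < 0.25:
        return 4.0 * p          # rising from 0 to 1
    if p < 0.75:
        return 2.0 - 4.0 * p    # falling from 1 to -1
    return 4.0 * p - 4.0        # rising from -1 back to 0

def encode_time_triangular(t_hours, period=24.0):
    """Approximate (sin, cos) by two triangular waves a quarter period apart."""
    phase = t_hours / period
    return triangle(phase), triangle(phase + 0.25)

midnight = encode_time_triangular(0.0)
morning = encode_time_triangular(6.0)
noon = encode_time_triangular(12.0)
evening = encode_time_triangular(18.0)
```

The resulting points lie on a square standing on one corner (|x|+|y|=1) instead of a circle. The presentation remains periodic, continuous and unambiguous and needs only additions and multiplications, but events which are equidistant in time are only approximately equidistant in the presentation.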

[0156] As stated earlier, in some situations the presentation of time may require more than one component. For example, there may be up to three sine/cosine pairs with periods of 24 hours, one week and one year.

[0157] Although preferred embodiments of the invention have been described in connection with neural networks and self-organizing maps, the invention is not limited to these examples. As an alternative, the invention can be generalized to other clustering techniques such as k-means and Learning Vector Quantization, in which case the neurons are replaced by codebook vectors.

[0158] This embodiment provides a method for teaching an anomaly detecting mechanism in a system comprising observable objects (302), at least one of which has a periodic time-dependent behaviour, the anomaly detecting mechanism comprising a computerized learning mechanism (314) having an input space for defining input data consisting of input data components (11, 12);

[0159] the method comprising:

[0160] assembling indicators (304) indicating the behaviour of the observable objects (302) and arranging the assembled indicators such that each observable object's indicators are assigned to the same input data component;

[0161] teaching the learning mechanism (314) such that the input data of the learning mechanism comprises the input data components which are based on the assembled indicators (304);

[0162] placing points (14) which approximate the input data in the input space;

[0163] incorporating a presentation of time (420-424) into at least one input data component (11, 12);

[0164] wherein the presentation of time (420-424) is periodic, continuous and unambiguous within the period (L) of the at least one element with periodic time-dependent behaviour.

[0165] In this method, the learning mechanism may be or comprise a self-organizing map.

[0166] The presentation of time may have a first period and at least one second period which is a multiple of the first period.

[0167] The input data components may be scaled such that each has the same variance, preferably one.

[0168] The presentation of time preferably has a variable resolution such that one bit corresponds to different units of time depending on the changes in the time-dependent behaviour.

[0169] This aspect of the invention furthermore provides an arrangement for detecting anomalies in a system comprising observable objects, at least one of which has a periodic time-dependent behaviour;

[0170] the arrangement comprising:

[0171] a computerized learning mechanism having an input space for defining input data consisting of input data components;

[0172] means for assembling indicators indicating the behaviour of the observable objects and arranging the assembled indicators such that each observable object's indicators are assigned to the same input data component;

[0173] means for teaching the learning mechanism such that the input data of the learning mechanism comprises the input data components which are based on the assembled indicators;

[0174] means for placing points which approximate the input data in the input space;

[0175] at least one input data component comprising a presentation of time;

[0176] wherein the presentation of time is periodic, continuous and unambiguous within the period of the at least one element with periodic time-dependent behaviour.

[0177] The arrangement may be comprised in a single network element.

[0178] According to one aspect of the invention, a computer readable storage medium is provided which comprises software for a computer, wherein executing the software in the computer causes the computer to carry out all or part of the above mentioned method steps.

[0179] The above described method and apparatus are adapted for teaching an anomaly detecting mechanism in a system comprising observable objects, at least one of which has a periodic time-dependent behaviour. The anomaly detecting mechanism comprises a computerized learning mechanism. The method comprises assembling indicators indicating the behaviour of the elements and arranging the assembled indicators such that each observable object's indicators are assigned to the same input data component. The learning mechanism is taught so that the input data of the learning mechanism comprises the input data components which are based on the assembled indicators. Points which approximate the input data are placed in the input space. A presentation of time is incorporated into at least one input data component wherein the presentation of time is periodic, continuous and unambiguous within the period of the at least one element with periodic time-dependent behaviour.

[0180] The invention can also be used in other industry areas than telecommunications and networks.

[0181] Although the invention has been described above with reference to specific embodiments, the scope of the invention also covers any alterations, additions, modifications, and omissions of the disclosed features.

1. Method for monitoring the behaviour of at least one observable object of a network, wherein at least one parameter of the observable object is repeatedly detected, the at least one parameter is checked with regard to fulfilling predetermined criteria, a vector is formed based on the monitored parameter depending on the result of the checking step, and the formed vector is evaluated for monitoring the behaviour of the observable object.
2. Method according to claim 1, wherein the formed vector is input to a learning process and to an analyzing process, the learning process is forming a reference, based on the input vector and a previous value of the reference or at least one previously input vector, for describing the behaviour of the observable object, and the analyzing process is comparing the input vector and the reference for detecting anomalous behaviour.
3. Method according to claim 1 or 2, wherein the number of parameters that fulfil the predetermined criteria during an observation period, is counted, for forming KPI values which form part of the vector, each parameter having its own criteria.
4. Method according to claim 3, wherein, when an RTT report value fulfils the predetermined criteria, value of one KPI is increased by one.
5. Method according to claim 3, wherein KPI value is a function of at least one predetermined RTT report field values of such RTT reports that fulfil the predetermined criteria.
6. Method according to claim 3, wherein KPI value is a function of at least one other KPI.
7. Method according to any one of claims 1 to 6, wherein the predetermined criteria are checked by comparing fields of an RTT report to predetermined field thresholds.
8. Method according to claim 4, 5, 6 or 7, wherein an RTT report containing the parameter values includes fields defining the end cause code, a length of call, sender identification, receiver identification, location information, used network resources, and/or used services.
9. Method according to claim 3, wherein the predetermined criteria and the observation period are user definable.
10. Method according to any one of the preceding claims, wherein the vector comprises several values which describe properties or functioning of the observable object.
11. Method according to any one of the preceding claims, wherein the vector is formed based on detected values in RTT (Real Time Traffic) reports.
12. Method according to claim 2, wherein the reference formed by the learning process is a profile generated from at least two vectors.
13. Method according to any one of claims 2 to 12, wherein the learning process comprises a self organizing map (SOM).
14. Method according to any one of claims 2 to 12, wherein the learning process comprises a K-means algorithm.
15. Method according to any one of claims 2 to 14, comprising the steps: Using the vector as input data to an algorithm which learns the normal functioning of the network, and forms the reference as a profile describing the normal functioning of the network, the profile consisting of nodes or cluster centroids, Finding best mapping nodes or cluster centroids of the profile for a new incoming vector; Counting distance between the new incoming vector and the best mapping nodes or cluster centroids; Counting a distribution of distances; Checking from the distribution whether a distance of the new incoming vector represents a probability which is less than a predetermined probability set-up value; Generating an alarm, if the check result is positive.
16. System for monitoring the behaviour of at least one observable object of a network, comprising a detecting means for detecting at least one parameter of the observable object, means for checking the at least one parameter with regard to fulfilling predetermined criteria, means for forming a vector based on the monitored parameter depending on the result of the checking step, and means for evaluating the formed vector for monitoring the behaviour of the observable object.
17. System according to claim 16, comprising a learning means for receiving an actually calculated vector and forming a reference, based on at least two vectors, for describing the behaviour of the observable object, and an analyzing means for receiving the actually calculated vector and comparing the received vector and the reference for detecting anomalous behaviour.
18. System according to any one of the preceding system claims, wherein the parameter is formed based on detected RTT (Real Time Traffic) report field values.
19. System according to any one of the preceding system claims, comprising means for: Using the vector as input data to an algorithm which learns the normal functioning of the network, and forms, as a reference, a profile describing the normal functioning of the network, the profile consisting of nodes or cluster centroids; Finding best mapping nodes or cluster centroids of the profile for a new incoming vector; Counting distance between the new incoming vector and the best mapping nodes or cluster centroids; Counting a distribution of distances; Checking whether a distance of an incoming vector has a probability value less than a predetermined probability set-up value; and Generating an alarm, if the check result is positive.
20. System according to claim 17, wherein the learning means includes a self organizing map (SOM) or K-means algorithm.